I. INTRODUCTION

The current research effort at the Naval Postgraduate
School is the investigation of the idea of a database
kernel. It is proposed that the attribute-based data model
and the attribute-based data language (ABDL) be used as a
kernel to support relational, hierarchical, and network
databases. A prototype software database system, the
Multi-Backend Database System (MDBS), which uses the
attribute-based data model, is the target kernel system.

The operations of the attribute-based data language are
RETRIEVE, INSERT, DELETE, and UPDATE, i.e., the primary
operations of any database management system. One proposal
is that additional operations be implemented in MDBS to
provide a more complete database kernel. In this thesis, we
investigate the addition of a sorting capability and the
relational join operation.

MDBS is a multiple-processor system. An interesting
issue, when considering the implementation of the sort and
join operations, is the distribution of functionality among
the multiple processors. In this thesis, we propose and
analyze various distributions of the functionality.

In analyzing the issues of alternative distributions of
the functions, our approach will be to use the existing
functional units in MDBS. We propose alternatives and
evaluate them according to the design goals of MDBS. Our
proposals require minimal interface changes among the
functional units.

We will approach the issues in the following manner. We
will make a number of proposals. We will analyze the time
complexity of the proposals. Then, based on the MDBS design
goals and the complexity analyses, we will make specific
recommendations.


A. THE ORGANIZATION OF THE THESIS

In the rest of the thesis, we examine the distribution
of functionality for the sort and join operations.
Specifically, Chapters II through V cover the sort function.
Then, Chapters VI through VIII cover the join function.

In Chapter II, we give a brief review of the MDBS
hardware and software architectures. In Chapter III, we
present the general assumptions and notation used in
analyzing the alternatives. In Chapter IV, we consider the
distribution of functionality among the controller and the
backends. In Chapter V, we consider specific algorithms for
introducing the sorting function. We also examine the case
where a particular sorting task does not fit the MDBS
architecture. We discuss how the sort function might
incorporate features of the MDBS software architecture as
well.

Chapter VI introduces the join. In Chapter VII, we
examine the alternative distributions of the join function
among the controller and the backends. In Chapter VIII, a
specific join algorithm, the sort-match join algorithm, is
examined in the context of MDBS. Finally, in Chapter IX, we
summarize our conclusions and discuss the contributions of
the thesis.

II. A REVIEW OF THE MDBS HARDWARE AND
SOFTWARE ARCHITECTURES

MDBS is a multiple minicomputer system that uses
off-the-shelf hardware and special-purpose software in an
innovative configuration to support high-performance database
operations and large-capacity databases. An overview of the
MDBS hardware organization is shown in Figure 2.1. The
backends and the controller, which are general-purpose
minicomputers, are connected by a broadcast bus. The controller
broadcasts each request to all backends at the same
time. The backends process the request, and send the results
to the controller via the broadcast bus. Intercommunication
between the backends is also via the broadcast bus. Every
backend has its own dedicated disk drives. The reader should
refer to [Ref. 1, 2, 3] for more detail.


A. DESIGN GOALS FOR MDBS

The major problem for conventional database systems is
their inability to achieve high performance as the database
grows and the rate of requests increases. In order to
overcome this problem, a high-performance multi-backend
database system should have the following properties.

(1) The throughput improvement is proportional to the
number of backends. That is, if the number of backends is
doubled, it should be possible to nearly double the size
of the database without affecting the throughput.

(2) The response time is inversely proportional to the
number of backends. It should be possible to nearly halve
the average response time by doubling the number of
backends.

(3) The system is extensible for capacity growth and
performance improvement. By extensibility, we mean that an
upgrade of the system can be made with no modification to
the existing hardware and software, and no significant
disruption of the system activity.

To meet the MDBS design goals, the controller is
implemented with the following goals. The amount of work
that the controller performs must be minimized in
order to avoid controller bottleneck problems.
Communication between the controller and the backends must
also be minimized in order to avoid bus contention. As a
consequence of the controller implementation goals, the
backends should do most of the work. Further, the
communication among the backends must be minimized.

B. THE MDBS SOFTWARE ARCHITECTURE

MDBS is designed to provide for database growth and
performance enhancement by the addition of identical
backends and their disks. The software architecture does not
require the development of new software when a backend is
added. In other words, the existing software supports many
backends as well as a few backends. The software
architecture allows replication of the existing software for
the new backends added for expansion. No new software is
developed. Reconfiguration is simple, and does not require
extensive system regeneration. The software architecture of
MDBS is shown in Figure 2.2. For more detail, refer to [Ref.
2, 3].


The software architecture also takes full advantage of
the parallelism in the hardware architecture. The software
of the backends supports parallel processing on the
database. There are three primary features which support
the parallelism. The first is the method by which the
database is distributed over the disk drives of the
backends.

The data model chosen for the system is the
attribute-based data model [Ref. 1]. In MDBS the database
consists of files of records. Each record is a collection of
keywords, optionally followed by a record body. A keyword is
made up of an attribute-value pair. A record body is a string
of characters not used by MDBS for search purposes. In
particular, the first attribute-value pair of each record in
a file consists of the attribute FILE and the file name as
its value. For performance reasons, records are
grouped into clusters based on the attribute values and
attribute value ranges in the records. These values and
value ranges are called descriptors. At database creation
time, the database creator specifies a number of
descriptors. These descriptors, called clustering
descriptors, are used for forming clusters of records.
An attribute that appears in a descriptor is called a
directory attribute. For the purposes of clustering, only
those keywords of the records which contain directory
attributes are considered. Such keywords of the record are
termed directory keywords.

This concept of clusters contributes to parallel
processing in the following manner. The distribution of
data across the backends is based on the concept of
clusters. The records of a cluster are distributed across
the backends according to the distribution algorithm
proposed in [Ref. 1]. Therefore, each backend has a part of
the cluster. Thus, each backend may access a portion of the
data required by a request. All backends can work and access
their portions in parallel.
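
As a rough illustration of why clustering helps parallelism, the
sketch below spreads the records of one cluster over the backends.
The round-robin placement is only an assumption made for
illustration; the actual distribution algorithm is the one proposed
in [Ref. 1] and is not reproduced here. The names
distribute_cluster and num_backends are hypothetical.

```python
def distribute_cluster(records, num_backends):
    """Spread the records of one cluster over the backends.

    Round-robin placement is an assumption for illustration only;
    it is not the distribution algorithm of [Ref. 1].
    """
    backends = [[] for _ in range(num_backends)]
    for i, record in enumerate(records):
        backends[i % num_backends].append(record)
    return backends


# Ten records of one cluster, each starting with the FILE keyword.
cluster = [{"FILE": "EMP", "DEPT": "A", "ID": n} for n in range(10)]
parts = distribute_cluster(cluster, 4)

# Every backend holds a part of the cluster, so a request that
# touches this cluster can be served by all backends in parallel.
sizes = [len(part) for part in parts]
```

With any placement of this kind, no backend holds the whole cluster,
which is the property the parallel access relies on.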

The second feature of the software architecture which
exploits the parallelism is the way in which directory data
is managed. Every backend has its own copy of the clustering
descriptors. The search for the descriptors related to a
request can thus be shared by all of the backends.

The third feature which supports parallelism is the
method used for scheduling requests and controlling
concurrent access to the database and the directory data.
Each backend keeps a request queue. Requests are scheduled
independently, as resources become available. Concurrency is
maintained separately at each backend with a locking
algorithm. Thus, the backends work independently and in
parallel. In exploring alternatives for the sort and join
operations, we will preserve this idea of independent,
parallel processing in the backends.

III. ALTERNATIVES FOR DISTRIBUTING THE SORTING FUNCTION


When considering alternatives for distributing the
sorting function among the processors of MDBS, we must consider
both the hardware and software architectures. The hardware
and software architectures, as explained in Chapter II, are
designed for distributing the functionality of the database
management operations across the backends. We must select
an alternative that exploits the inherent parallelism of the
architecture. The architecture of MDBS dictates minimal
controller function, minimal message traffic, and identical
software for the backends. The alternatives which we
recommend should be consistent with the constraints imposed by
the existing hardware and software architectures.

We will consider the complexity of the sort function to
include only the overhead incurred by adding an ordering
specification to a RETRIEVE request; the time required to
retrieve records is not considered. We will develop
expressions which represent the CPU activity, the I/O
activity, and the communication activity on the bus, i.e.,
the computing complexity, the access complexity, and the
communication complexity, respectively.

When analyzing complexity for functions distributed
across the backends, remember that the backends are working
in parallel. The result of this distribution of work among
backends operating in parallel is that the linear
complexity, the sum of the work done at all B backends, is
reduced to an effective complexity, the work required at the
backend which does the most work. Assumption 5 is that the
number of blocks to be sorted is evenly distributed across
the backends. Therefore, since each backend will do an
equal amount of work, the effective complexity is equal to
the complexity at any one backend.
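
The distinction between linear and effective complexity can be
sketched in a few lines. The function names and the sample
workloads below are hypothetical and serve only to illustrate the
two definitions given above.

```python
def linear_complexity(work_per_backend):
    """Sum of the work done at all backends."""
    return sum(work_per_backend)


def effective_complexity(work_per_backend):
    """Work at the backend that does the most work; since the
    backends run in parallel, this bounds the elapsed time."""
    return max(work_per_backend)


B, b = 4, 8                      # 4 backends, 8 blocks each (assumed)
even = [b] * B                   # assumption 5: evenly distributed
uneven = [5, 8, 12, 7]           # same total, uneven distribution

even_linear = linear_complexity(even)
even_effective = effective_complexity(even)
uneven_effective = effective_complexity(uneven)
```

Under the even distribution the effective complexity is the work of
any single backend; with the uneven workload the total is unchanged
but the slowest backend determines the elapsed time.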


A. ASSUMPTIONS

In each case, we will analyze the worst-case complexity
of the current alternative. In order to simplify the
analysis, we make the following assumptions.

(1) Only internal sorting is considered, due to memory
limitations. The backends are currently 16-bit
minicomputers with a fixed, 32 K-byte address space.
Therefore, memory limitation is a real problem.

(2) All records in a block are to be sorted (i.e.,
selection of records is performed by record processing
before sorting).

(3) Sorting is block-by-block (i.e., a block of records
selected by the record processing function is passed to
the sorting function, where they are sorted and stored in
the secondary storage for merging).

(4) Merge is 2-way. This is the simplest case. We will
consider K-way merge in Chapter V.

(5) The number of blocks to be sorted is evenly
distributed across the backends (i.e., if there are M
blocks to be sorted and B backends, then each backend
sorts M/B blocks).

(6) Sorting algorithms of the order ( r*log r ),
where r is the number of records, will be used.

(7) Records are sorted on a single concatenated key (i.e.,
only a single execution).

(8) The time to send a block of data across the broadcast
bus is an average time, which will be represented as a
constant.

(9) The time to read (or write) a block of data from (or
to) the disk is an average time, which will be represented
as a constant, and is the same for the controller and the
backends.

(10) The CPU time required for a comparison operation is
the same at the controller and at the backends.


B. NOTATION

In analyzing the time complexity, we will deal with
variables which represent the number of backends, the number
of records to be sorted, the number of records in a block,
etc. We will also deal with certain constants. For example,
according to assumption 8 above, there is some constant
which represents the time required to send a block of data
across the broadcast bus. For uniformity, we define the
following variables and constants to be used throughout the
analysis.

(1) B = the number of backends in the system.

(2) N = the total number of records to be sorted for a
particular request.

(3) r = the number of records in a block.

(4) b = the number of blocks to be sorted at the backend.
Note that, according to assumption 5, b = N/(r*B). To
simplify the analysis, we will assume that b is a power
of 2.

(5) log : stands for logarithm to the base 2 unless
otherwise noted.


C. SYNTAX FOR THE SORT FUNCTION

The syntax of a retrieve request in MDBS is as follows:

RETRIEVE Query Target-list [BY attribute] [WITH pointer]

That is, it consists of five parts. The first part is the
name of the request. The second part is a query, which
identifies the portion of the database to be retrieved. The
target-list is a list of elements. Each element is either
an attribute or an aggregate operator to be performed on an
attribute. The fourth part of the request, the BY clause, is
optional. It covers all the values of the attribute, so
that BY DEPT means every department in the
database. The fifth part of the request, WITH pointer, is
also optional; it specifies whether pointers to the
retrieved records must be returned to the user or user
program for later use in an update request, which is outside
the concerns of the sort function.

In order to perform the sort function, we first need to
retrieve the records that are relevant to the user request.
Therefore, a modified retrieve request can be used as the
syntax of the sort function.

With a modified RETRIEVE request, we may consider two
different alternatives for a syntax to implement the sort
function in MDBS:

1) RETRIEVE Query Target-list (ORDER_BY (Attribute_list_1))

2) RETRIEVE Query (ORDER_BY (Attribute_list_1),
(Attribute_list_2))


In both alternatives, the first two parts are the same
as in the regular retrieve request. In the first alternative,
the Target-list clause consists of the attribute names with
which the result of the sort function is given to the user.
The ORDER_BY clause defines the function to be performed on
the retrieved records. Attribute_list_1 defines the list of
attribute names with which the retrieved records are
sorted. If there is more than one attribute name in
Attribute_list_1, then it is assumed that the order of the
attributes in Attribute_list_1 gives the order of
implementation of consecutive sort functions on the
retrieved records. Attribute_list_1 may contain either
directory attributes or non-directory attributes or both.
The important point is that each attribute in
Attribute_list_1 must be an attribute that the records
retrieved from the database contain.

In the second alternative, Attribute_list_1 includes the
attribute names with which the records are sorted. The order
of performing the sort function on the records is again the
same as the order of the attributes given in
Attribute_list_1.

Attribute_list_2 includes the attribute names with which
the result of the sort function is given to the user. In
other words, it can be thought of as a target-list.
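
The effect of the second syntax alternative can be sketched as
follows. This is only an illustration, not MDBS code: the function
order_by, the sample records, and the choice to treat the first
attribute of Attribute_list_1 as the most significant part of a
single concatenated key (in the spirit of assumption 7) are all
assumptions.

```python
def order_by(records, attribute_list_1, attribute_list_2):
    """Sort records on attribute_list_1, then project each record
    onto attribute_list_2 (which acts as the target-list)."""

    def concatenated_key(record):
        # Assumed: the first attribute is the most significant.
        return tuple(record[a] for a in attribute_list_1)

    ordered = sorted(records, key=concatenated_key)
    return [{a: rec[a] for a in attribute_list_2} for rec in ordered]


employees = [
    {"NAME": "Kim", "DEPT": "B", "SALARY": 300},
    {"NAME": "Lee", "DEPT": "A", "SALARY": 500},
    {"NAME": "Ray", "DEPT": "A", "SALARY": 200},
]

# Roughly: RETRIEVE Query (ORDER_BY (DEPT, SALARY), (NAME, DEPT))
result = order_by(employees, ["DEPT", "SALARY"], ["NAME", "DEPT"])
names = [rec["NAME"] for rec in result]
```

Every attribute named in the sort list must be present in the
retrieved records; otherwise the key lookup in this sketch fails,
which mirrors the requirement stated above.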


IV. THE DISTRIBUTION OF FUNCTIONALITY

In analyzing the distribution of function, we will
assume that the sort function consists of two phases: the
internal sort phase and the merge phase. Because of main
memory limitations, we require that the records first be
sorted block-by-block. The sorted blocks are stored in
temporary storage in the secondary memory. This is done by
the internal sort phase. The sorted blocks will then be
accessed from the secondary storage and merged. This is done
by the merge phase. The time complexity of these two
processes will be shown separately. At the end of the
analysis of each alternative, the total time complexity will
be given.
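
The two phases can be sketched as below. This is a minimal
illustration under stated assumptions: in-memory lists stand in for
blocks held in secondary storage, records are plain integers, and
the function names are hypothetical.

```python
import heapq


def internal_sort_phase(blocks):
    """Sort each block on its own; each sorted block is one run."""
    return [sorted(block) for block in blocks]


def merge_phase(runs):
    """Apply 2-way merge passes until a single run remains.

    Returns the merged run and the number of passes over the data.
    """
    passes = 0
    while len(runs) > 1:
        merged = []
        for i in range(0, len(runs), 2):
            pair = runs[i:i + 2]          # one or two runs
            merged.append(list(heapq.merge(*pair)))
        runs = merged
        passes += 1
    return (runs[0] if runs else []), passes


blocks = [[9, 4], [7, 1], [8, 3], [6, 2]]   # 4 blocks of 2 records
result, passes = merge_phase(internal_sort_phase(blocks))
```

With four blocks the merge needs two passes, in line with the
log(number of blocks) passes used in the complexity analysis.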

We will consider three alternatives regarding the
distribution of function through the system. Since MDBS
consists of two types of functional units, namely the
controller and the backends, the possible distributions of
functionality are the following:

A. The controller performs the sort function.
B. The backends perform the sort function.
C. The controller and the backends share the sort
function.

We will analyze these three alternatives in detail in the
following sections.


A. THE CONTROLLER PERFORMS THE SORT FUNCTION

In this alternative, the backends perform no additional
functions. All of the sorting is done at the controller.
The backends perform the selection, projection, and
aggregation operations specified in the RETRIEVE request,
and forward the result records to the controller. The
controller accumulates the result records from all of the
backends, and sorts them in the order specified in the
RETRIEVE request before forwarding them to the requester.

There is no change in the functionality of the backends.
Therefore, no modification of the software of the backends
is required. However, at least two processes in the
controller will require modification. First, the request
processing process must be augmented to recognize the
ordering specification of the request, and to forward the
ordering specification to the post-processing process. The
post-processing process must be augmented to recognize that
sorting is required, and to accumulate and sort result
records for a request according to the proper ordering
specification.

First, we assume that all blocks for a query have been
accumulated and stored in the secondary storage of the
controller. The controller will have (B*b) blocks of
records. The internal sort phase for each block will require
O(r*log r) time, where there are r records per block. The
total computing complexity for the internal sort phase is
then

   O( B*b*r*log r ).

2*B*b accesses to the secondary storage are required
in the internal sort phase. So, the access complexity
of the internal sort phase is

   O( B*b ).

Since there are (B*b) blocks at the controller, log(B*b)
will be the number of passes over data for merging. Each
pass will require (B*b*r - 1) comparisons. So, the
computing complexity of the merge phase will be
(log(B*b)*(B*b*r-1)), which is in

   O( B*b*r*⌈log(B*b)⌉ ).

2*B*b*log(B*b) accesses to the secondary storage are
required for merging, so the access complexity of the merge
phase is

   O( B*b*⌈log(B*b)⌉ ).

Therefore, the worst-case computing complexity for the
sort function is

   O( B*b*r*( log r + ⌈log(B*b)⌉ ) ), or
   O( N*log N ),

and the access complexity is

   O( B*b*⌈log(B*b)⌉ ), or
   O( (N/r)*log(N/r) ).

In this case, since all sorting and merging is done at
one processor, the controller, the effective complexity and
the linear complexity are the same.
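
The counts used in this analysis can be tabulated with a small
helper. This is a sketch under assumptions: the function name is
hypothetical, the internal sort is charged exactly r*log r
comparisons per block (the analysis only fixes the order), and the
sample values of B, b, and r are arbitrary.

```python
import math


def controller_sort_cost(B, b, r):
    """Counts for alternative A: the controller sorts and merges
    all B*b blocks of r records each with a 2-way merge."""
    blocks = B * b
    passes = (blocks - 1).bit_length()      # ceil(log2(blocks))
    return {
        "sort_comparisons": blocks * r * math.log2(r),
        "sort_accesses": 2 * blocks,        # read and write each block
        "merge_passes": passes,
        "merge_comparisons": passes * (blocks * r - 1),
        "merge_accesses": 2 * blocks * passes,
    }


cost = controller_sort_cost(B=4, b=2, r=8)   # N = B*b*r = 64 records
```

For these sample values there are 8 blocks, so merging takes 3
passes of 63 comparisons each, all of it charged to the controller.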


B. THE BACKENDS PERFORM THE SORT FUNCTION

Here we consider two strategies. In the first, all
the backends share the internal sort phase, and the merge
phase is performed by one or two backends. In the second,
each backend sorts and merges the blocks of data resident at
that backend. The backends then share the work of merging,
with B/2 backends performing the first partial merge, B/4
backends performing the next partial merge, etc. Let us
examine each of these strategies in detail.

1. All Backends Sort, and One or Two Backends Merge

In this alternative all backends perform the
internal sort phase individually. After the internal sort
phase is complete, one or two predetermined backends
complete the process by merging all of the sorted blocks.
So each backend sorts b blocks of r records. The
computing complexity of the internal sort phase at each
backend is

   O( b*r*log r ),

and 2*b accesses to the secondary storage are required, so
the access complexity of the internal sort phase is

   O( b ).

This is the effective complexity for sorting. Since the work
of sorting is shared among the backends, we use the
effective complexity in our analysis.

Next, the sorted blocks of records must be
transmitted along the broadcast bus to the one or two
backends which will perform the merge phase. Let us take
the case where one backend does all the merging. Then, if
there are B backends, (B-1)*b blocks must be transmitted.
The communication complexity is

   O( B*b ).

Also, B*b accesses to the secondary storage are required to
store the transmitted blocks at the backend assigned to
perform the merge phase. This requires the access complexity

   O( B*b ).

The backend selected to perform the merging now has
(B*b) blocks. Merging (B*b) blocks at the backend requires
the time of (B*b*r-1)*log(B*b). The computing complexity is

   O( B*b*r*⌈log(B*b)⌉ ).

Since 2*B*b*log(B*b) accesses to the secondary storage are
required, the access complexity of the merge phase at this
backend is

   O( B*b*⌈log(B*b)⌉ ).

Therefore, the total computing complexity for the
sort function is

   O( b*r*( log r + B*⌈log(B*b)⌉ ) ),

the access complexity is

   O( b*B*⌈log(B*b)⌉ ),

and the communication complexity is O( B*b ).
2. All Backends Sort Separately and Share the Merge

In this strategy, all the backends, as in the
previous section, share the work of sorting. Therefore, the
computing complexity of the internal sort phase is, again,
the effective complexity,

   O( b*r*log r ),

and the effective access complexity is O( b ).

Then, each backend performs the merge phase over its
own b blocks. This requires the computing complexity of

   O( b*r*⌈log b⌉ )

and the access complexity of

   O( b*⌈log b⌉ ).

Next, the merge phase is shared by the backends in
the manner shown in Figure 4.1. First, B/2 backends perform
a merge pass, each merging 2*b blocks. Then B/4 backends
perform a merge pass, each merging 4*b blocks. This process
is repeated log B times. Now let us look at the computing
complexity of the merge phase step by step.

   1. step          ( 2*b*r - 1 ) * log(2)
   2. step          ( 4*b*r - 1 ) * log(2)
   3. step          ( 8*b*r - 1 ) * log(2)
   4. step          ( 16*b*r - 1 ) * log(2)
   5. step          ( 32*b*r - 1 ) * log(2)
   ...
   ⌈log B⌉. step    ( 2^⌈log B⌉*b*r - 1 ) * log(2)

The expression for the computing complexity of the merge
phase, then, as derived from the above, is

   O( b*r * SUM(i = 1 to ⌈log B⌉) 2^i ).

Again, this is the effective complexity.


At each step, each target backend first stores the
blocks transmitted from its neighbor backend before the
merge phase starts. Over all steps, this requires the access
complexity

   O( b * SUM(i = 1 to ⌈log B⌉) 2^(i-1) ).

Since the merge phase is performed in log B steps,
the access complexity of the merge phase is derived as
follows:

   1. step          O( 2*b ) accesses
   2. step          O( 4*b ) accesses
   3. step          O( 8*b ) accesses
   ...
   ⌈log B⌉. step    O( 2^⌈log B⌉*b ) accesses

Therefore, the effective access complexity of sharing the
merge phase for this alternative is

   O( b * SUM(i = 1 to ⌈log B⌉) 2^i ).


At each step, one half of the total number of blocks must be
transmitted over the broadcast bus to the target backends
for the next step. Since there are log B steps, the
communication complexity between the backends is

   O( B*b*⌈log B⌉ ).

Figure 4.1. Performing the Sort Function Step-by-Step
at the Backends
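
The step-by-step costs of the shared merge can be checked
numerically. The sketch below is illustrative only: the function
name and sample values are hypothetical, and each step is charged
one 2-way merge pass over the blocks held by a target backend, as
in the step list of this section.

```python
def shared_merge_steps(B, b, r):
    """Per-step costs when B/2, B/4, ... backends share the merge.

    Returns the comparisons done at one target backend in each step
    and the number of blocks sent over the bus in each step.
    """
    steps = (B - 1).bit_length()            # ceil(log2(B))
    comparisons = [(2 ** i) * b * r - 1 for i in range(1, steps + 1)]
    blocks_sent = [B * b // 2] * steps      # half of all blocks per step
    return comparisons, blocks_sent


comparisons, blocks_sent = shared_merge_steps(B=8, b=2, r=4)
total_comparisons = sum(comparisons)
total_blocks_sent = sum(blocks_sent)
```

Because the per-step terms double, their sum has the closed form
b*r*(2^(L+1) - 2) - L with L = ⌈log B⌉, while the bus traffic grows
as (B*b/2)*L.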


C. THE CONTROLLER AND THE BACKENDS SHARE THE SORT FUNCTION

We examine two strategies for distributing the sort
function between the controller and the backends. The first
strategy is that the backends perform only the internal sort
phase, and the controller performs the merge phase. The
second strategy is that the backends perform the internal
sort phase and a partial merge, merging all of the records
in the blocks stored at that backend, and the controller
completes the merge process. Let us examine each of the
strategies in detail.
1. Backends Sort Block-by-Block and Controller Merges

Every backend performs the internal sort phase on
its part of the file. Each block is sorted and forwarded
directly to the controller for merging. The time complexity
for the internal sorting of a block is O( r*log r ), where
there are r records in a block. The effective computing
complexity of the internal sort phase is

   O( b*r*log r ),

where there are b blocks per backend. 2*b accesses to the
secondary storage are required, so the effective access
complexity is

   O( b ).

The sorted blocks are sent to the controller via the
broadcast bus. This communication cost is included in the
cost of the RETRIEVE operation, and is not an overhead cost
for sorting. However, the controller must store (B*b)
blocks before the merge phase starts. This requires the
access complexity

   O( B*b ).

The controller now has B*b blocks to be merged. The
computing complexity for a 2-way merge is
(log(B*b)*(B*b*r-1)), which is

   O( B*b*r*⌈log(B*b)⌉ ).

2*B*b*log(B*b) accesses to the secondary storage are
required, so the access complexity for the merge phase is

   O( B*b*⌈log(B*b)⌉ ).

So, the computing complexity for this alternative is

   O( b*r*log r + B*b*r*⌈log(B*b)⌉ ), or
   O( b*r*( log r + B*⌈log(B*b)⌉ ) ),

and the access complexity is

   O( b*B*⌈log(B*b)⌉ ).


2. The Backends Sort and Perform a Partial Merge, and
the Controller Completes the Merge

In this case every backend sorts its part of the
requested file, and the controller merges those partially
sorted file parts being sent from every backend. Since the
backends perform the internal sort phase block-by-block, the
effective computing complexity of internal sorting is

   O( b*r*log r ).

Assuming that every internally sorted block is stored back
into the secondary storage, the access complexity is

   O( b ).

Now each backend merges the sorted blocks resident at that
backend. The number of passes over the data required for
the merge phase is log b. Therefore, the effective
computing complexity of merging b blocks at the backend is
(b*r - 1)*log b, or

   O( b*r*⌈log b⌉ ),

and the access complexity is

   O( b*⌈log b⌉ ).

So, the computing complexity for the internal sort
and merge phases at the backends is

   O( b*r*( log r + ⌈log b⌉ ) ),

and the access complexity is

   O( b*⌈log b⌉ ).

Transmission of these blocks to the controller is, again,
not part of the sorting cost. However, since the
transmitted blocks are to be stored before the merge phase
starts, this requires the access complexity

   O( B*b ).

The controller, now, will have B runs of sorted
records to be merged. The logarithmic value of the number of
backends gives the number of passes over data, log B. So
the computing complexity of the merge phase at the
controller is

   O( B*b*r*⌈log B⌉ ),

and the access complexity is

   O( B*b*⌈log B⌉ ).
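
To make the difference between the two strategies of this section
concrete, the sketch below counts merge comparisons at the
controller and at one backend. The function names and sample values
are hypothetical; the counts follow the 2-way merge expressions
used above (passes times comparisons per pass).

```python
def ceil_log2(n):
    """Smallest integer k with 2**k >= n, for n >= 1."""
    return (n - 1).bit_length()


def controller_merge_comparisons(B, b, r, partial_merge_at_backends):
    """Merge comparisons at the controller.

    Without a partial merge the controller starts from B*b sorted
    blocks; with it, the controller starts from B sorted runs.
    """
    runs = B if partial_merge_at_backends else B * b
    return ceil_log2(runs) * (B * b * r - 1)


def backend_merge_comparisons(b, r, partial_merge_at_backends):
    """Merge comparisons at one backend (zero if it only sorts)."""
    if not partial_merge_at_backends:
        return 0
    return ceil_log2(b) * (b * r - 1)


B, b, r = 4, 8, 16
first_strategy = controller_merge_comparisons(B, b, r, False)
second_strategy = controller_merge_comparisons(B, b, r, True)
backend_share = backend_merge_comparisons(b, r, True)
```

For these values the partial merge removes log b of the
controller's log(B*b) passes, at the price of log b passes over the
local data at each backend, performed in parallel.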


D. EVALUATING THE ALTERNATIVES

In the previous sections we have presented five
alternative distributions of functionality between the
controller and the backends. In this section we will
analyze the tradeoffs of the alternatives. Table 1
summarizes the computing complexities of the internal sort
and the merge phases, the access complexity, and the
communication complexity for all five alternatives.

Alternative A represents the distribution of function
presented in Section A of this chapter. The controller
performs all of the sorting and all of the merging.
Alternative B.1 represents the distribution presented in
Section B.1 of this chapter. All of the backends perform the
sorting, and one or two backends perform the merging of the
sorted blocks. Alternative B.2 represents the distribution
of function presented in Section B.2 of this chapter. All
backends perform the sorting and share the merging.

Alternative C.1 represents the distribution of function
presented in Section C.1 of this chapter. All the backends
perform the sorting and the controller performs the merging.
Finally, alternative C.2 represents the distribution of
function presented in Section C.2 of this chapter. Backends
sort and perform a partial merge, and the controller
performs the final merge.

The complexity formulas for both accesses to the
secondary storage and block transmissions are given only for
the additional accesses or transmissions necessary to
complete the sort function. In other words, accesses to the
secondary storage to retrieve the records in order to
perform selection and projection before the sort function
starts, and transmission of the blocks from the backends to
the controller, are not included. In general, each
alternative, except A, has the same time complexity with
regard to the internal sort phase. Therefore, we will focus
on the other columns in comparing the alternatives.

First, let us examine alternative A, where the
controller performs all sorting and merging. The computing
complexity is O( B*b*r*log(B*b) ) for sorting and merging
(B*b) blocks of r records. As is easily seen, this alternative
is contrary to the design goal of minimizing the controller
function. Therefore, we will eliminate it from further
consideration.

Next, let us examine alternative B.1, where all backends
perform the sorting and one or two backends perform the
merge. The backends perform all of the work of sorting and
merging. Even though this alternative seems to meet the
design goal of minimizing the controller function, it is
contrary to the second design goal of minimizing the message
traffic between the backends. The communication complexity
is O(B*b) for (B*b) blocks. Clearly, for queries involving
a large number of blocks, the communication overhead will be
high and bus congestion may result. Another disadvantage is
that a single backend performs the merging. When the
single backend is performing the merging, it may delay the
processing of other queries, thus causing a decrease in
system throughput. Because of the communication overhead
and the potential for decrease in throughput, we will also
eliminate alternative B.1 from further consideration.

Next, we consider alternative B.2, where all backends
share the sorting and the merging. The communication
complexity is O(B*b*log B) for (B*b) blocks. The
communication complexity increases logarithmically with B,
the number of backends. Clearly, this alternative is also
contrary to the design goal of minimizing the message traffic
between the backends. Also, the computing complexity for the
merge phase and the access complexity increase exponentially
with log B, where B is the number of backends. Clearly, with
this alternative, increasing the number of backends will
cause longer response time and decreased throughput. So, we
do not consider B.2 to be a desirable distribution of
function.

This leaves us with alternatives C.1 and C.2.
In alternative C.1, the backends perform the sorting
block-by-block and the controller merges all the blocks.
In alternative C.2, the backends perform the sorting and
a partial merge, and then the controller performs the final
merge. Neither alternative incurs transmission overhead.
Therefore, the design goal of minimizing the bus traffic is
met.

In both alternatives the work of sorting and merging is
shared between the backends and the controller. Alternative
C.1, however, does involve more work for the controller than
alternative C.2. Since the backends perform the main
portion of the merge process in C.2, the controller's work is
reduced. On the other hand, the workload of the backends is
greater with alternative C.2 than with alternative C.1.
In the next section, let us analyze these two alternatives
with respect to the design goals of minimizing the
controller function and maximizing the work done by the
backends.


E. COMPARISONS BETWEEN ALTERNATIVES C.1 AND C.2

In this section we will compare the two alternatives,
namely C.1 and C.2. In comparing these two alternatives, we
will analyze computing complexity and access complexity
separately. Since the time to do one disk access is much
longer than the CPU time to perform one comparison, separate
analyses will be more meaningful.

As is shown in Table 1, the internal sort phase
computing complexity is the same for both alternatives.
However, with alternative C.2, the backends perform a part
of the merge. Consider that, for both alternatives, if the
number of blocks is held constant, increasing the number of
backends will cause the number of blocks to be sorted at each
backend, b, to decrease. This decrease is linear with respect
to the number of backends. Therefore the computing
complexity on the backends decreases linearly with an
increasing number of backends.

However, the amount of work done by the controller is
clearly less with alternative C.2 than with alternative C.1,
due to the fact that the backends offload some of the work
of merging from the controller.
Consider the case where the total number of records (N=B*b*r)
is held constant. The computing complexity of alternative C.1
will not vary with the number of backends. For
alternative C.2, the computing complexity of the merging at
the controller will increase logarithmically with the number
of backends. However, the computing complexity for merging
at the controller will always be less for alternative
C.2 than for C.1 by (B*b*r*log b). Since b decreases
as B increases, the gain will be proportionately smaller as
B grows large. Clearly, however, alternative C.2 better
meets the goal of minimizing the controller function.
Clearly, a substantial reduction in the controller workload
will result from assigning more functionality to the
backends.


Now let us examine the effect of increasing the number
of backends. We will analyze the computing complexities,
access complexities, and communication complexities of both
alternatives. Let us examine the case that the total number
of records, N=B*b*r, and the number of records per block,
r, remain constant, while the number of backends, B,
increases.

For alternative C.1, the total computing complexity is

b*r*log r + B*b*r*log(B*b), or

N*( log N - ((B-1)/B)*log r ), since b=N/(B*r).

This obviously yields decreasing results for increasing
values of B. This reduction, however, will have a minor effect
on the result of the computing complexity. Furthermore, the
reduction can be ignored for large N values.


For alternative C.2, the total computing complexity is

b*r*log r + b*r*log b + B*b*r*log B, or

N*( (1/B)*log N + ((B-1)/B)*log B ), since b=N/(B*r).

(1/B)*log N obviously decreases with increasing B.
However, ((B-1)/B)*log B increases for increasing B values.
There is some breakpoint beyond which the increasing term
outweighs the decreasing term.
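
The breakpoint can be located numerically. The following is a minimal sketch, assuming base-2 logarithms and backend counts that are powers of two. The function names and the sample values of N and r are illustrative assumptions and are not part of MDBS.

```python
import math

def c1_computing(N, B, r):
    """Total computing complexity of alternative C.1:
    N*(log N - ((B-1)/B)*log r)."""
    return N * (math.log2(N) - (B - 1) / B * math.log2(r))

def c2_computing(N, B):
    """Total computing complexity of alternative C.2:
    N*((1/B)*log N + ((B-1)/B)*log B)."""
    return N * (math.log2(N) / B + (B - 1) / B * math.log2(B))

def c2_breakpoint(N, candidates):
    """Return the candidate number of backends with the smallest C.2 total."""
    return min(candidates, key=lambda B: c2_computing(N, B))

if __name__ == "__main__":
    N, r = 2 ** 20, 64                      # assumed example sizes
    backends = (2, 4, 8, 16, 32, 64)
    for B in backends:
        # Print the per-record cost of each alternative for this B.
        print(B, c1_computing(N, B, r) / N, c2_computing(N, B) / N)
    print("C.2 breakpoint:", c2_breakpoint(N, backends))
```

Under these assumed sizes the C.2 total stops improving once the log B term dominates, while it remains below the C.1 total for every B in the list.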
Let us assume that we double the number of backends. The
difference in the total complexities between the case that
the backends are not doubled and the case that they are
doubled is

N*( (1/(2*B))*log N - (1/(2*B))*log B - ((2*B-1)/(2*B)) ).

As long as the condition, log N > log B + (2*B-1), holds,
the total complexity will be reduced. So, if the condition
N > B*2^(2*B-1) holds, then doubling the number of backends
from B to 2*B decreases the computing complexity.

Now, let us examine the access complexity. For
alternative C.1, the total access complexity is

b + B*b*log(B*b), or

(N/r)*( (1/B) + log(N/r) ), since b=N/(B*r).

Clearly, this complexity decreases as B increases. However,
the decrease still has a minor effect, especially for a large
N.


The access complexity for alternative C.2 is

b*log b + B*b*log B, or

(N/r)*( (1/B)*(log N - log r) + ((B-1)/B)*log B ).

Here, (1/B)*(log N - log r) decreases, but ((B-1)/B)*log B
increases as B increases. Let us again assume that we double
the number of backends. The difference in complexities going
from B backends to 2*B backends is

(N/r)*(1/(2*B))*( log N - log B - log r - 2*B + 1 ).

As long as the condition, log N > log B + log r + 2*B - 1,
holds, the total access complexity decreases. So, if the
condition (N/r) > B*2^(2*B-1) holds, then doubling the
number of backends from B to 2*B decreases the total access
complexity.

Figure 4.2 illustrates the computing and access
complexities for both alternatives for two values of N, with
r=64, as B increases. As is easily seen, alternative C.2 is
always better than alternative C.1 for meeting the design
goals of MDBS.
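
The two doubling conditions of this section can be checked against the closed-form access complexity. This is a hedged sketch: the function names and the sample sizes used in the example are assumptions made for illustration, and all logarithms are taken to base 2.

```python
import math

def c2_access(N, B, r):
    """Access complexity of alternative C.2:
    (N/r)*((1/B)*(log N - log r) + ((B-1)/B)*log B)."""
    return (N / r) * ((math.log2(N) - math.log2(r)) / B
                      + (B - 1) / B * math.log2(B))

def doubling_reduces_computing(N, B):
    """Condition for the computing complexity: N > B * 2^(2*B - 1)."""
    return N > B * 2 ** (2 * B - 1)

def doubling_reduces_access(N, B, r):
    """Condition for the access complexity: N/r > B * 2^(2*B - 1)."""
    return N / r > B * 2 ** (2 * B - 1)

if __name__ == "__main__":
    N, r = 2 ** 20, 64                      # assumed example sizes
    for B in (2, 4, 8):
        predicted = doubling_reduces_access(N, B, r)
        observed = c2_access(N, 2 * B, r) < c2_access(N, B, r)
        print(B, predicted, observed)
```

For each B in the example, the condition and the direct comparison of the two access totals agree.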


F. RECOMMENDED DISTRIBUTION OF FUNCTION

In the previous sections of this chapter we have
analyzed the alternatives of the distribution of the
functionality and shown the tradeoffs and the advantages of
each one. Briefly, alternatives A, B.1, and B.2 are
contrary to the design and implementation goals of MDBS. The
other two alternatives, C.1 and C.2, are pertinent to our
concerns.

At each comparison for alternatives C.1 and C.2 in the
previous section, we have shown that C.2 is the better
alternative for a large number of records. Therefore, we
recommend that the functionality be distributed in the
following manner: the backends perform the sorting and a
partial merge, and the controller performs the final merge.


V. EFFICIENT ALGORITHMS FOR THE SORT AND MERGE PHASES


In our previous analyses, we made the assumption that
records are sorted one block at a time, using some
well-known sorting algorithm with time complexity of
O(r*log r), where r is the number of records in a block. In
this chapter, we will examine the effect of sorting records
n blocks at a time, and the effect of using a k-way merge.


A. SORTING WITH n BLOCKS AT A TIME

Let us consider sorting n blocks at a time. There are two
cases to consider. First, if the sorted blocks are stored
back into the backend's secondary storage, our analysis will
be the same as the previous ones, except that the
coefficients of the computing complexity formulae will be
proportional to n.

The computing complexity for n-blocks-at-a-time sorting
is O( n*r*log(n*r) ). This process will be repeated b/n times.
Therefore, the effective computing complexity is
O( b*r*log(n*r) ). The access complexity remains the same,
O(b). Since there will be b/n runs to be merged, the number
of passes over the data becomes log(b/n). The computing
complexity for the merge phase, then, will be
O( b*r*log(b/n) ). The access complexity for the merge phase
is O( b*log(b/n) ).




Table 2 summarizes time complexities for both the
block-by-block and the n-block-at-a-time algorithms. As is
easily seen, the computing complexity for sorting with the
n-block-at-a-time algorithm exceeds that for sorting
block-by-block by (b*r*log n). However, the computing
complexity for merging is less by (b*r*log n), and the access
complexity for merging is less by (b*log n).
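
The shift of work from the merge phase to the sort phase can be verified with a few lines of arithmetic. The sketch below assumes base-2 logarithms and that n divides b. The function names and sample sizes are illustrative assumptions only.

```python
import math

def sort_cost(b, r, n):
    """Computing complexity of sorting n blocks at a time: b*r*log(n*r)."""
    return b * r * math.log2(n * r)

def merge_cost(b, r, n):
    """Computing complexity of merging the b/n runs: b*r*log(b/n)."""
    return b * r * math.log2(b / n)

def merge_accesses(b, n):
    """Access complexity of merging the b/n runs: b*log(b/n)."""
    return b * math.log2(b / n)

if __name__ == "__main__":
    b, r, n = 64, 64, 4                     # assumed example sizes
    extra_sort = sort_cost(b, r, n) - sort_cost(b, r, 1)
    saved_merge = merge_cost(b, r, 1) - merge_cost(b, r, n)
    saved_access = merge_accesses(b, 1) - merge_accesses(b, n)
    # The extra sorting work equals the merging work saved: b*r*log n.
    print(extra_sort, saved_merge, saved_access)
```

The sum of the sort and merge computing costs is unchanged by n. What sorting n blocks at a time buys is the reduction in merge accesses.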

Figure 5.1 shows the effect of increasing the number of
backends on the throughput of the CPU when sorting
n-block-at-a-time. The x-axis shows the number of backends,
and the y-axis shows the computing complexity of sorting and
merging at the backends. The y-axis is shown with log scale.
These values were derived as follows. The complexity
formulae are expressed in terms of N, the total number of
records to be sorted, and B, the number of backends. The
computing complexity required at the backends is

(b*r*log(n*r)) + (b*r*log(b/n)), or

(N/B)*(log N - log B), since b=N/(B*r).

Using various values of N, varying B from 2 to 16, we arrive
at the curves shown in Figure 5.1.


B. THE K-WAY MERGE

Up to this point, we have utilized a 2-way merge process
for our analyses. As we recall, the time complexity of the
merge is dominated by the number of runs, i.e., the number of
blocks which are already internally sorted. The logarithmic
value of the number of runs gives the number of passes
over the data. In a 2-way merge, the number of passes is the
logarithm base 2 of the number of runs. If we increase the
order of the merge to k, for a k-way merge, the number of
passes will be the logarithm to the base k of the number of
runs.

Let us examine what we gain with this reduced number of
passes. We assume that all runs are of equal length. The
notation used is:

R     = number of runs = b/n
log x = logarithm base 2 of x
LOG x = logarithm base k of x


First of all, our access time will be reduced.
In a 2-way merge, the access complexity is
O( b*⌈log R⌉ ).
In a k-way merge, the access complexity is
O( b*⌈LOG R⌉ ).
Figure 5.2 gives us some information about the reduction in
access complexity for a fixed number of runs, R, and a fixed
number of blocks, b, as k increases. The x-axis shows k,
where k is the number of blocks merged at one time. The
y-axis is scaled with maximum 1, where 1 represents the access
complexity for the 2-way merge. The ratio of k-way merge
access complexity to the 2-way merge complexity is graphed
here. For instance, the access complexity for a 4-way merge
is one half of that for a 2-way merge.

On the other hand, of course, increasing k will increase
the computing complexity, or the time that is necessary to
compare the values. In the 2-way merge, the computing
complexity is

O( b*r*⌈log R⌉ ).

In a k-way merge, the computing complexity is

O( b*r*(k-1)*⌈LOG R⌉ ), or
O( b*r*(k-1)*(log R / log k) ),

since LOG(R) = log(R) / log(k).

Figure 5.3 shows the increase in computing complexity
with regard to k for a fixed R. The y-axis for this figure
is scaled starting with minimum 1, where 1 represents the
computing complexity for a 2-way merge at the backend. The
ratio of k-way merge computing complexity to the 2-way merge
computing complexity is graphed here. For instance, the
4-way merge computing complexity is 1.5 times that of the
2-way merge computing complexity.
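
The two ratios plotted against k follow directly from the formulas above: 1/log k for the access complexity and (k-1)/log k for the computing complexity. A minimal sketch, assuming base-2 logarithms and ignoring the ceiling on the number of passes, is given here. The function names are illustrative.

```python
import math

def access_ratio(k):
    """Access complexity of a k-way merge relative to a 2-way merge:
    LOG R / log R = 1 / log k."""
    return 1 / math.log2(k)

def computing_ratio(k):
    """Computing complexity of a k-way merge relative to a 2-way merge:
    (k-1) * LOG R / log R = (k-1) / log k."""
    return (k - 1) / math.log2(k)

if __name__ == "__main__":
    for k in range(2, 9):
        print(k, round(access_ratio(k), 3), round(computing_ratio(k), 3))
```

Tabulating both ratios side by side makes it easy to see where further increases in k buy little additional reduction in accesses while the comparison cost keeps growing.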




As is seen from Figure 5.2, the access complexity reduces
rapidly, down to 40% of the 2-way merge complexity at k=6.
However, at this point the computing complexity has doubled.
After this point, k>6, the reduction in access complexity
becomes negligible relative to the increasing computing
complexity. Therefore, we can take the point, k=6, as an
implementation point for the degree of the merge.


C. UTILIZING THE SOFTWARE ARCHITECTURE OF MDBS

Up to this point we have not considered how the existing
features of the MDBS software architecture might be utilized
for the sort function. We may ask whether the descriptor and
cluster information can be used to improve sorting. Another
question is whether existing I/O mechanisms can be used to
support the temporary storage requirements. A third question
is whether an alternative strategy should be adopted when
the records are not evenly distributed across the backends.
We will examine these three questions in detail.

1. Utilizing the Descriptor and Cluster Information

Recall that the database in MDBS is organized into
clusters. Each cluster has a unique cluster id, and is
associated with a unique set of descriptor ids. A record
belongs to one and only one cluster. The cluster to which a
record belongs is determined by the set of descriptor ids
which can be derived from the directory keywords of the
record.

How might this help us in sorting? First consider
the case that the primary (first-listed) attributes in the
ordering specification are not directory attributes. In this
case, the cluster to which a record belongs has no bearing
on the final sorted order.

Next consider the case where the attributes in the
ordering specification are all directory attributes. In this
case, we can use cluster information in the following
manner. If we also know the relative order of the descriptor
ids which determine the clusters, we may simply concatenate
the records from the cluster having the lowest order
descriptor ids with the cluster having the next higher order
descriptor ids, and so on.

Finally, consider the case where the primary
(first-listed) attributes in the ordering specification are
directory attributes, and the secondary attributes in the
ordering specification are non-directory attributes.

Let us first take a look at what we may need in order to
utilize the existing mechanisms. What is useful for the
sorting process is to know the cluster ids and, consequently,
the group of descriptor ids (DIDs). The necessary point is to
know the DIDs associated with records. Record processing
must be informed of the DIDs of records as well as their
addresses, and the records must be retrieved in terms of
cluster numbers, that is, no record belonging to another
cluster is retrieved until all the records belonging to a
cluster are retrieved. This process guarantees that if the
records are going to be sorted on an attribute which is a
directory table attribute, and if the attribute is either a
type A or a type B attribute, then no two clusters will have
a record with the same attribute value. We also need another
process to define which cluster has the lesser or larger
value of the attributes. This process needs to check the DIDs
of the clusters in the descriptor-to-descriptor-id table and
give a list of DIDs.

Next, consider the utility of the above cases.
First of all, we need to implement three different
algorithms to handle these three different cases. Second,
the probability that the primary sort specification
attributes are directory attributes is unknown.

Let us assume that the system will be augmented with
the implementation of the cluster information. In that case,
modifications to MDBS are required. Recall that record
processing knows only the addresses of the records to be
retrieved. Therefore, record processing must be informed
about not only cluster information but also descriptor
information from directory management, including the relative
ordering of clusters based on descriptor ids. We do not have a
mechanism available to support this idea. Moreover,
this implementation violates the information-hiding
principles upon which directory management is designed.

2. Utilizing the Existing Mechanism for Storing Temporary
Data

In the previous sections we have assumed that the
system was providing the temporary storage requirements for
the sort function. We have not considered how this
might be accomplished. We know that the system allocates
tracks as required for new clusters or for extending existing
clusters. Therefore, we know that there exists a mechanism
for allocating storage. The difficulty is that the
allocation is related to the concept of a cluster, and not
to a block of data.

In order to use the existing mechanisms, then, we
must establish some relationships between blocks of sorted
data and clusters. Since we are sorting block-at-a-time, we
initially need to establish as many temporary clusters as we
have blocks of data. Then, with each successive pass of the
merge algorithm, we will require only half the previous
number of clusters, although the total space required
remains the same.

In current MDBS, storage is allocated only in the
case of an insert request, where the record is to be
inserted into an already-full cluster or a new cluster must
be established. The list of available (free) secondary
storage addresses is maintained by directory management. New
addresses for new clusters are assigned during the
address-generation phase.

The second consideration is that addresses are
associated with specific clusters, and new cluster ids are
assigned only by the controller. The third consideration is
that records are inserted record-at-a-time, based on an
insert request. For the sorting process, we wish to insert
blocks of records.

In order to use the existing mechanisms, we must
modify MDBS so that

(1) The sort process can request a new temporary cluster.
This would involve sending a message to the controller.

(2) Directory management can generate addresses as required
for the temporary clusters.

(3) Record processing can insert blocks of records as well
as a single record.

(4) Temporary clusters and their storage can be freed when
no longer needed.

This is a disadvantage due to the extensive modifications.

As an alternative, we may consider the following
case. Reserve a certain number of addresses as temporary
storage at system setup time. Use these addresses and the
low-level read and write functions of record processing for
temporary storage.

3. The Case That Records Are Not Evenly Distributed
Across the Backends

Our time complexity formulas reflect the perfect
conditions for distribution of the records which are to be
sorted. They do not give the correct results for the
condition that one backend contains all the records to be
sorted and the other backends do not contain any records. In
such a case there are two alternatives to be considered. The
first is that the backend having the records performs the
sort function without redistribution of records. The second
is that the records are redistributed evenly among the
backends. In the following sections we will examine these
two alternatives in detail.
a. The Backend Performs the Sort Function

Assuming that the algorithm in Chapter IV,
section C.2 has been selected for implementing the sort
function in MDBS, we will calculate the time complexities for
the sort function. The backend now contains (B*b) blocks. So,
the internal sort phase time complexity is

O( B*b*r*log r ),

and 2*B*b accesses to the secondary memory are required,
which is

O( B*b ).

The merge process requires the time O( B*b*r*⌈log(B*b)⌉ )
with the access time to the secondary memory
O( B*b*⌈log(B*b)⌉ ).

Therefore, the sort function time complexity is

O( B*b*r*(log r + log(B*b)) ),

and the required accesses to the secondary memory are

O( B*b*⌈log(B*b)⌉ ).

b. The Records are Distributed Evenly Among the
Other Backends

In this alternative, the backends should inform
the controller if they do not have any records to be sorted.
The controller then manages the transfer of the records from
the one backend to the other backends.

Since one backend contains B*b blocks, b*(B-1)
blocks are to be transmitted to the other backends. This
requires communication time of O(B*b). The backends now
contain an equal number of blocks, b. The time complexities
can then be calculated as in Chapter IV, section C.2.

The internal sort process time is O( b*r*log r ).
The merge process time at the backends is O( b*r*log b ).
Accesses required at the backends are O( b*(1+log b) ).
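
Whether redistribution pays off depends on the block transmission time, which is left open here. The following sketch only illustrates the trade-off. The unit costs t_cmp (time per comparison) and t_block (time to transmit one block) are hypothetical parameters, and the model counts only the backend computing work listed above, ignoring disk accesses and the controller's final merge.

```python
import math

def single_backend_time(B, b, r, t_cmp):
    """One backend sorts and merges all B*b blocks:
    B*b*r*(log r + log(B*b)) comparisons."""
    return t_cmp * B * b * r * (math.log2(r) + math.log2(B * b))

def redistributed_time(B, b, r, t_cmp, t_block):
    """b*(B-1) blocks are transmitted, then each backend sorts and
    merges its own b blocks: b*r*(log r + log b) comparisons."""
    transfer = t_block * (B - 1) * b
    sorting = t_cmp * b * r * (math.log2(r) + math.log2(b))
    return transfer + sorting

if __name__ == "__main__":
    B, b, r = 4, 16, 64                     # assumed example sizes
    for t_block in (0.0, 100.0, 1000.0):    # hypothetical unit costs
        print(t_block,
              single_backend_time(B, b, r, 1.0),
              redistributed_time(B, b, r, 1.0, t_block))
```

With a small t_block the redistributed sort is cheaper. Once t_block grows large enough, the transfer term dominates and leaving the records on one backend becomes the better choice under this simplified model.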

Depending on the average time required to
transmit a block from one backend to another, we can
analyze the difference between the aforementioned
alternatives. At this moment we do not know the value of the
transmission time of a block. Clearly, there are some cases
in which the transmission is not cost-effective.


VI. INTRODUCTION TO THE JOIN


In this part of the thesis, we investigate possible ways
of implementing the join operation in MDBS. We consider how
the functions of the join operation can be distributed
between the controller and the backends. Again, we wish to
take all possible advantage of the parallelism inherent in
the MDBS hardware and software architecture. We also wish to
adhere to the design goals of MDBS, in particular the
minimization of the controller function and message traffic.

In this chapter, we define the terminology and notation
which we will use in our analysis, and make some simplifying
assumptions. In Chapter VII, we consider alternative
distributions of the functions of the join operation between
the controller and the backends. We examine an alternative
join algorithm, a sort-and-match algorithm, in Chapter VIII.
Finally, a recommendation for implementation is given in a
later chapter.


A. TERMINOLOGY AND NOTATION 

First, let us define some terminology. A join involves
two relations, the source relation and the target relation.
The join is formed over an attribute (or attributes) that
belong both to the source relation and to the target
relation. We will call these the source attribute(s) and the
target attribute(s), respectively. The domains of the source
attribute(s) must be the same as the domains of the target
attribute(s).

There are many types of joins. First we examine the
natural join. An example will illustrate the
natural join. The relations participating in a natural join
are given in Figure 6.1.(a). Relation S, the source relation,
consists of three-tuples of attributes, A, B, and C.
Relation T, the target relation, consists of three-tuples of
attributes, B, C, and D. The assumption is made that
attributes having the same name are defined over the same
domain of values. Thus, the attributes B and C in relation
S are assumed to be drawn from the same domain of values as
the attributes B and C in relation T. Figure 6.1.(b) shows
the cross product of relations S and T, SxT. It is formed
by concatenating each tuple of relation S with every tuple
of relation T.

The natural join is formed in two steps. First select
from SxT those tuples such that the values of both columns
headed by B and both columns headed by C are the same.
There are three such tuples, the first, fifth, and ninth
given in Figure 6.1.(b). The second step is to project from
these tuples one column for each distinct attribute. The
result relation, S|X|T, is shown in Figure 6.1.(c).

Figure 6.1. Natural join of two relations

In general, a join operation is specified with comparison
operators. These operators express arithmetic relationships
between the values of the source attribute(s) and the target
attribute(s). The natural join shown in the example above
could be specified as the join of S and T where the attribute
values of B and C in S are identical to the attribute values
of B and C in T, i.e., S.B=T.B and S.C=T.C. When the equal
comparison operator is used, the join operation is called an
equality join. When any other comparison operator is used,
the join operation is called an inequality join. The join
operation is associative, so that more than two relations may
be joined. For example, the join of three relations S, T, and
U is the same as the join of S and T, followed by the join
of U and the result of the first join.
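
The two steps of the natural join (select from the cross product, then project) can be sketched as follows. The tuples below are hypothetical sample data, chosen so that the matching cross-product tuples are the first, fifth, and ninth. They are not the contents of Figure 6.1.

```python
from itertools import product

# Hypothetical relations: S over (A, B, C) and T over (B, C, D).
S = [
    {"A": "a1", "B": "b1", "C": "c1"},
    {"A": "a2", "B": "b2", "C": "c2"},
    {"A": "a3", "B": "b3", "C": "c3"},
]
T = [
    {"B": "b1", "C": "c1", "D": "d1"},
    {"B": "b2", "C": "c2", "D": "d2"},
    {"B": "b3", "C": "c3", "D": "d3"},
]

# Cross product SxT: each tuple of S paired with every tuple of T.
cross = list(product(S, T))

# Step 1: select the pairs whose B values and C values agree,
# remembering the 1-based position of each pair in the cross product.
selected = [
    (position, s, t)
    for position, (s, t) in enumerate(cross, start=1)
    if s["B"] == t["B"] and s["C"] == t["C"]
]

# Step 2: project one column for each distinct attribute.
result = [
    {"A": s["A"], "B": s["B"], "C": s["C"], "D": t["D"]}
    for _, s, t in selected
]

if __name__ == "__main__":
    print([position for position, _, _ in selected])
    for row in result:
        print(row)
```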

There are a variety of join algorithms. The simplest is
the straightforward or nested-loops join. The algorithm is
shown in Figure 6.2.


For each tuple in the source relation do
    For each tuple in the target relation do
        If the join condition holds true then
            Form a result tuple

Figure 6.2. The Straightforward Join Algorithm

In the chapters which follow, we will simplify our
analysis by assuming that join operations are restricted to
equality joins over a single source attribute and a single
target attribute. The terms, source relation and target
relation, refer to the files participating in a join
operation in MDBS. Hence, the source file refers to a source
relation, and the target file refers to a target relation.


We will also adopt the following notation:

Cs  : The number of records in a source file.
Ct  : The number of records in a target file.
n   : The number of blocks belonging to a source file in
      a backend, Cs/(B*r).
m   : The number of blocks belonging to a target file in
      a backend, Ct/(B*r).
q   : The portion of the cross-product of a source file and
      a target file which participates in the join
      operation.
log : Logarithm to the base 2.


B. ASSUMPTIONS
In analyzing the alternatives of the distribution of
the join function, we make the following assumptions.

1) The source and target records are distributed equally
across the backends.

2) The join operation is an equality join over a single
source attribute and a single target attribute.

3) The join function is performed after the retrieval and
selection operations specified in the request have been
performed.

4) The straightforward or nested-loops join algorithm is
used to perform the join.

5) Accesses to the secondary storage are carried out
block-by-block.

6) The source and target files do not contain any
duplicate records (i.e., after retrieval of the records
which are to participate in the join operation, there are
no two identical records in the source file or in the
target file). Therefore, there is no record elimination
process from the files.


C. A SYNTAX FOR THE JOIN

In this section, we will give a syntax for a 2-way join.
MDBS utilizes an attribute-based data language, ABDL, for
user queries. Indeed, ABDL can be used for any database
application as a kernel language for any kind of database
machine. Queries in current database application languages,
for instance SQL, can be mapped to ABDL requests.

Using ABDL, a 2-way equality join request is shown as
the following.

RETRIEVE (attribute list 1) (query 1)
CONNECT (attribute 1, attribute 2)
(attribute list 2) (query 2)


The RETRIEVE clause implies that the records whose
attribute-value pairs given in attribute list 1 satisfy the
conditions given in query 1, and the records whose
attribute-value pairs given in attribute list 2 satisfy the
conditions given in query 2, are retrieved from the
database. Let R1 and R2 be the two different files
containing these records, respectively. The CONNECT
clause specifies the join on the relations R1 and R2 with
the attributes attribute 1, which is (implicitly or
explicitly) in attribute list 1, and attribute 2, which is
in attribute list 2.




VII. ALTERNATIVE DISTRIBUTIONS OF THE JOIN FUNCTION


In analyzing the alternative distributions of the join
function, we will again consider three different
possibilities.

A. The controller performs the join function.
B. The backends perform the join function.
C. The join function is shared by the controller and the
backends.

We will examine each of these alternatives in detail in the
following sections.


A. THE CONTROLLER PERFORMS THE JOIN FUNCTION

In this alternative, the backends perform the retrieval
of the records which will participate in the join operation.
These records are then sent to the controller, and the
controller performs the join.

Since each backend contains n source file blocks and m
target file blocks, the communication complexity is

O( B*(n+m) ), or

O( (Cs+Ct)/r ).

Then, storing these blocks in the secondary storage of the
controller has

O( (Cs+Ct)/r )

access complexity.

After receiving the records from all backends, the
controller now has (B*n) source file blocks and (B*m) target
file blocks. Using the straightforward join algorithm, each
record in the source file is compared with each record in
the target file in order to form the join. This requires
(Cs*Ct) comparisons. So, the computing complexity is

O( Cs*Ct ).

Assuming that no more than one block of the source file
and one block of the target file are in the primary storage
at one time, 2*(B*n*m) accesses to the secondary memory are
required. In terms of the cardinalities of the source and
target files, this is the access complexity

O( (Cs*Ct)/r ).


B. THE BACKENDS PERFORM THE JOIN FUNCTION 

In this alternative, we will consider three different
strategies. In the first, the backends share the join
operation equally. In the second, the join function is
performed step-by-step at the backends. In the third, a
single backend performs the join function with the complete
source and target files. Let us examine the details of these
strategies.


1. The Backends Share the Join Equally

In this strategy, the backends send either source or
target records to each other. Let us assume that the target
records are transmitted between the backends. After
transmission of the records, each backend contains Ct target
records. Next, each backend performs the join function over
its own part of the source records and all of the target
records. Then, the result records from the backends are
transmitted to the controller.

Since each backend contains m target file blocks,
(B*m) target file blocks are transmitted. Therefore, the
complexity of transmitting the target file blocks among the
backends is

O( B*m ), or

O( Ct/r ).

Each backend first stores (B*m) target file blocks,
which requires access complexity of

O( Ct/r ).

Each backend now contains n source file blocks and
(B*m) target file blocks. Therefore, the effective computing
complexity for performing the join is

O( B*n*m*r^2 ), or

O( (Cs*Ct)/B ).

2*B*m*n accesses to the secondary storage are required, so
the access complexity is

O( (Cs*Ct)/(B*r) ).


Finally, each backend transmits the result records
to the controller. Let us assume that each backend yields
the same number of result records, expressed as a percentage
q of the cross-product of the records participating in
the join. Then, the number of the records to be transmitted
from each backend to the controller is (q*B*n*m*r^2), or
(q*(Cs*Ct)/B). The communication complexity for transmitting
the result records from B backends to the controller
is

O( q*(Cs*Ct)/r ).
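
The costs of the controller-only alternative and of this first backend strategy can be put side by side. The sketch below is a hedged illustration using only the closed forms in terms of Cs, Ct, B, r, and q. The function names, the dictionary keys, and the sample values are assumptions introduced for the example.

```python
def controller_join(Cs, Ct, r):
    """Alternative A: every block is sent to the controller, which
    compares each source record with each target record."""
    return {
        "blocks_sent": (Cs + Ct) / r,      # B*(n+m)
        "comparisons": Cs * Ct,            # all at the controller
    }

def shared_join(Cs, Ct, B, r, q):
    """Strategy B.1: the target blocks are exchanged among the
    backends and each backend joins its own source records."""
    return {
        "blocks_sent": Ct / r,                     # B*m target blocks
        "comparisons_per_backend": Cs * Ct / B,    # B*n*m*r^2
        "result_records": q * Cs * Ct,             # from all backends
    }

if __name__ == "__main__":
    # Assumed example sizes.
    print(controller_join(1000, 2000, 50))
    print(shared_join(1000, 2000, 4, 50, 0.001))
```

In this example the comparisons that the controller would perform alone are divided evenly among the B backends, at the cost of broadcasting the target file.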
2. The Backends Perform the Join Step-by-Step

In this strategy, the join operation is performed
step-by-step at the backends. At each step, the number of
backends involved in the join is reduced by one-half. A
backend performing the join function sends its source and
target records to its neighbor backend. Figure 7.1 depicts
the flow of records. The total number of steps required
is log B, where B is the number of backends.

The arrows indicate the transmission direction of
blocks. At each step, the backends involved first perform
the join on the portions of the source and target files
available, and send the partial result to the controller.
Next, the subsets of the source and target files are sent to
the neighbor backend.

Figure 7.1. Performing the Join Function Step-by-Step
at the Backends

At each step, the number of blocks to be transmitted
over the broadcast bus is half of the total number of source
file blocks plus half of the total number of target file
blocks. Thus, the communication complexity for log B steps
is

O( ((Cs+Ct)/(2*r)) * log B ).

At each step, the backends receiving the source and
target records from their neighbors first store them before
the join starts. The effective access complexity of storing
the records at each step is derived as follows.

1. step       (1/2)*(Cs+Ct)/(B*r)
2. step       1*(Cs+Ct)/(B*r)
3. step       2*(Cs+Ct)/(B*r)
4. step       4*(Cs+Ct)/(B*r)
...
log B. step   2^(log B-2)*(Cs+Ct)/(B*r)

Therefore, the total effective access complexity for storing
the records is

O( 2^(log B-1) * (Cs+Ct)/(B*r) ).


The computing complexity of the join is derived as
follows.

1. step       (n*r)*(m*r)       =  n*m*r^2
2. step       (2*n*r)*(2*m*r)   =  4*n*m*r^2
3. step       (4*n*r)*(4*m*r)   =  16*n*m*r^2
...
log B. step   (2^(log B-1)*n*r)*(2^(log B-1)*m*r)
                                =  2^(2*(log B-1))*n*m*r^2

Therefore, the total effective computing complexity is

O( 2^(2*(log B-1)) * Cs*Ct/B^2 ).


Since the number of source and target blocks participating
in the join changes at each step, the access complexity
during the join is derived as follows.

1. step       2*(n*m)
2. step       2*(2n*2m)
3. step       2*(4n*4m)
4. step       2*(8n*8m)
...
log B. step   2*(2^(log B-1)*n * 2^(log B-1)*m)

The total effective access complexity is

O( 2^(2*(log B-1)) * (Cs*Ct)/(B^2*r^2) ).
Only the result records are transmitted to the
controller. Since we use q*Cs*Ct to represent the number of
result records, the communication complexity is

O( q*(Cs*Ct)/r ).
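
The growth of the per-step cost can be checked by summing the steps explicitly. The following Python sketch assumes that B is a power of two and that the portions held by a joining backend double at every step, as in the step lists above; the function name step_by_step_costs is hypothetical.

```python
import math


def step_by_step_costs(B: int, n: int, m: int, r: int):
    """Sum the join work over the log B steps of the step-by-step strategy.

    Assumes B is a power of two. Returns (steps, comparisons, accesses),
    where the totals are the sums of the per-step work at one backend.
    """
    steps = int(math.log2(B))
    comparisons = 0
    accesses = 0
    for k in range(steps):           # k = 0 corresponds to step 1
        src_blocks = (2 ** k) * n    # source blocks held at this step
        tgt_blocks = (2 ** k) * m    # target blocks held at this step
        comparisons += (src_blocks * r) * (tgt_blocks * r)
        accesses += 2 * src_blocks * tgt_blocks
    return steps, comparisons, accesses
```

Each step costs four times as much as the one before it, so the last step dominates the sum. For B=8, n=m=1, r=2 the comparison counts per step are 4, 16, and 64.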


3. One Backend Performs the Join Function

In this strategy, the source and target records at
each backend are transmitted to a designated backend, which
then performs the join. Since each backend contains n
source file blocks and m target file blocks, the
communication complexity is

O( B*(n+m) ), or

O( (Cs+Ct)/r ).


The records sent from the other backends are first
stored into the secondary storage of the designated backend.
This is the access complexity of

O( (Cs+Ct)/r ).

The designated backend now contains Cs source
records and Ct target records. Using a straightforward join
algorithm, the computing complexity is

O( Cs*Ct ),

and 2*B^2*n*m accesses to the secondary storage are required,
with the access complexity of

O( Cs*Ct/r^2 ).


The designated backend produces q*Cs*Ct result
records. Transmission of these result records to the
controller has complexity of

O( q*Cs*Ct/r ).


C. THE CONTROLLER AND THE BACKENDS SHARE THE JOIN FUNCTION

In this alternative, the controller and the backends
share the join function, and the controller integrates the
results. Each backend transmits its part of both the
source records and the target records to the controller. At
the same time, each backend performs a partial join with its
source and target records. In the meantime, the controller
performs the join function with the sets sent from the
backends, except for those sets which are joined at the
backends.

Let n1,n2,...,nB be the subsets of the source file and
m1,m2,...,mB be the subsets of the target file such that
backend i, Bi, contains the subsets ni and mi. The
transmission of the whole source and target file into the
controller has the communication complexity of

O( B*(n+m) ), or

O( (Cs+Ct)/r ).


The controller first stores the records. This requires
the access complexity of O( (Cs+Ct)/r ).

The partial join function at the backend has
computing complexity of O( Cs*Ct/B^2 ), and access
complexity of O( Cs*Ct/(B^2*r^2) ).


The controller now contains the source sets ni and the
target sets mi. Since the backends perform only part of the
join, the rest of the join function is performed at the
controller. This means each ni is compared with mj to output
the result records such that 1<= i <=B and 1<= j <=B and
i is not equal to j. This requires B*(B-1) times (n*m)
comparisons. Therefore, the join function at the controller
has computing complexity of

O( n*m*B*(B-1)*r^2 ), or

O( Cs*Ct ),

and access complexity of O( B^2*n*m ), or O( Cs*Ct/r^2 ).
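
The division of the B*B subset pairs between the backends and the controller can be enumerated directly. The following Python sketch is an illustration only; controller_pairs and backend_pairs are hypothetical names.

```python
from itertools import product


def backend_pairs(B: int):
    """Pairs (i, i): backend i joins its own source subset ni with mi."""
    return [(i, i) for i in range(1, B + 1)]


def controller_pairs(B: int):
    """Pairs (i, j) with i != j: the part of the join left to the controller."""
    return [(i, j)
            for i, j in product(range(1, B + 1), repeat=2)
            if i != j]
```

For B=4 the backends cover 4 of the 16 pairs and the controller covers the remaining 12, so the share of the work left to the controller grows as B increases.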
D. EVALUATING THE ALTERNATIVE DISTRIBUTIONS OF THE
FUNCTIONALITY

In the previous sections we have presented five
alternative distributions of the functionality of join
between the controller and the backends. In this section, we
will analyze the tradeoffs of the alternatives. Table 3
summarizes the results of the analyses in terms of the
computing, access, and communication complexities.

Alternative A represents the distribution of function
presented in Section A of this chapter. The controller
performs the join function. Alternative B.1 represents the
distribution presented in Section B.1 of this chapter. The
backends share the join function equally. Alternative B.2
represents the distribution presented in Section B.2 of this
chapter. The backends perform the join function step-by-
step. Alternative B.3 represents the distribution
presented in Section B.3 of this chapter. Finally,
alternative C represents the distribution presented in
Section C of this chapter. The controller and the backends
share the join function. Let us examine each of these
alternatives with regard to the design goals of MDBS.

Alternative A is clearly contrary to the design goal of
minimizing controller function. Therefore, we will eliminate
it from further consideration. Alternative B.1 meets the
goals of minimizing controller function and distributing the
work over the backends. The communication complexity is also
less than that of either of the other alternatives, B.2 and
B.3.

Alternative B.2 meets the design goal of minimizing
controller function. However, the computing and access
complexities increase exponentially with the factor of log
B. This is an especially important consideration for the
overhead in the system. In addition, the same blocks will be
broadcast log B times over the broadcast bus, increasing the
communication overhead. As we recall, a similar procedure
was proposed in Chapter IV for the sort function. However,
the characteristic of the join function does not take
advantage of this procedure. At each step, the output of the
backends is wasted, since each record in the source file
must be compared with every record in the target file to
form the join. The same records will be transmitted between
the backends redundantly. Therefore, we will eliminate this
alternative from further consideration.

Alternative B.3 does not meet the design goal of sharing
the work between the backends. Furthermore, transmission of
source and target file blocks into the designated backend
increases the communication overhead. This alternative is
also eliminated from further consideration.

Alternative C increases the amount of work which is to
be done by the controller. This is also contrary to the design
goal of minimizing controller function. Therefore, we will
eliminate this alternative from further consideration.

As easily seen from the above explanations, alternative
B.1 is the best approach to the distribution of
functionality. As we recall, a straightforward join
algorithm is utilized to analyze the alternative
distributions. Having decided the best alternative for
distributing the join function with the simplest join
algorithm, we can improve the efficiency of the chosen
alternative by using a different algorithm, the sort-match
algorithm. This situation will be examined in the following
chapter.

VIII. AN ALTERNATIVE JOIN ALGORITHM

In the previous chapter, we analyzed the distribution of
the functions of the join operation in MDBS assuming that a
straightforward join algorithm is used. In the first part
of the thesis, we discussed how the sort function can be
implemented in MDBS. Assuming that the sort function is
implemented as recommended in Chapter IV, we now discuss how
the join operation can be implemented using a sort-match
algorithm.


A. ALTERNATIVE DISTRIBUTIONS OF THE JOIN FUNCTION BY USING
A SORT-MATCH ALGORITHM

In using a sort-match algorithm, the source records
and the target records are first sorted. Then, the join
function is performed. The join can be formed by a simple
matching of the source attribute values and the target
attribute values.
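
The sort-then-match idea can be illustrated with a small in-memory example. The following Python sketch is a minimal version of the technique, not the MDBS implementation; it assumes records are tuples whose first field is the join attribute, and the name sort_match_join is hypothetical.

```python
def sort_match_join(source, target, key=lambda rec: rec[0]):
    """Equi-join two record lists by sorting both and matching in one pass."""
    src = sorted(source, key=key)
    tgt = sorted(target, key=key)
    result = []
    i = j = 0
    while i < len(src) and j < len(tgt):
        ks, kt = key(src[i]), key(tgt[j])
        if ks < kt:
            i += 1                      # source value too small, advance
        elif ks > kt:
            j += 1                      # target value too small, advance
        else:
            # find the run of target records sharing this value
            j_end = j
            while j_end < len(tgt) and key(tgt[j_end]) == ks:
                j_end += 1
            # pair every source record of this value with that run
            while i < len(src) and key(src[i]) == ks:
                for t in tgt[j:j_end]:
                    result.append((src[i], t))
                i += 1
            j = j_end
    return result
```

Once both inputs are sorted, each file is scanned only once, apart from the re-reading of runs of equal values. This is why the join phase no longer depends on the product of the two file sizes.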

In Chapter IV, we examined how to perform the sort
function at MDBS. As we recall, our proposal was to apply
the alternative C.2 in Chapter IV: the backends sort and
perform a partial merge, and the controller performs the
final merge. With such a capability, we propose two
alternatives for distributing the functions of the sort-match
join algorithm among the controller and the backends.

The first alternative is as follows. Each backend performs
sort and partial merge of the source and target records.
Then, each backend broadcasts its target records to all
other backends. Each backend then joins its portion of the
source records with all of the target records, transmitting
the results to the controller.

The second alternative is the following. The backends
perform sort and partial merge on the source and target
records, which are then transmitted to the controller. The
controller performs the final merge of the source records
and of the target records, and then performs the join of all
of the source records and all of the target records. Let us
examine each of these alternatives in detail.

1. The Backends Share the Join

In this case, both source and target files are first
sorted at the backends separately. Using a comparison-based
sorting algorithm, the effective computing complexity of the
internal sort phases of both Cs/B source and Ct/B target
records is

O( ((Cs+Ct)/B) * log r ).


2*(n+m) accesses to the secondary storage are required. So,
the effective access complexity during the internal sort
phases of both source and target files is

O( (Cs+Ct)/(B*r) ).

Assuming that the merge phase is also performed to
complete the sorting of both files, the effective computing
complexity of the merge phase at the backends is

O( n*r*log n + m*r*log m ), or

O( (Cs/B)*log(Cs/(B*r)) + (Ct/B)*log(Ct/(B*r)) ).

2*(n*log n + m*log m) accesses to the secondary storage are
required to complete the merge function. Therefore, the
effective access complexity for the merge at the backends is

O( n*log n + m*log m ), or

O( (Cs/(B*r))*log(Cs/(B*r)) + (Ct/(B*r))*log(Ct/(B*r)) ).


Next, the target records are transmitted between the
backends. This is the communication complexity of

O( B*m ), or

O( Ct/r ).

The target records transmitted from the other
backends are first stored before the join starts. This is
the access complexity of

O( Ct/r ).


Each backend now contains n blocks of the source and
B*m blocks of the target file. That is, each backend has one
run of source file and B runs of target file blocks with the
length of n and m, respectively. B*m target blocks, then,
must be merged by each backend. Assuming that a 2-way merge
is used, the computing complexity of merging B*m blocks at a
backend is

O( B*m*r*log B ), or

O( Ct*log B ).

2*B*m*log B accesses to the secondary storage are required.
So, the access complexity required during the merge of
target records is

O( (Ct/r)*log B ).
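
The log B factor comes from the number of passes a 2-way merge needs to combine B sorted runs. The following Python sketch counts those passes on in-memory lists; it illustrates the counting argument only, and two_way_merge_passes is a hypothetical name.

```python
import heapq


def two_way_merge_passes(runs):
    """Merge sorted runs two at a time; return (merged_list, passes)."""
    runs = [list(run) for run in runs]
    passes = 0
    while len(runs) > 1:
        merged = []
        for i in range(0, len(runs), 2):
            pair = runs[i:i + 2]          # one or two runs
            merged.append(list(heapq.merge(*pair)))
        runs = merged                     # about half as many runs remain
        passes += 1
    return (runs[0] if runs else []), passes
```

Every pass reads and writes all of the records once and halves the number of runs, so B runs need log B passes, rounded up when B is not a power of two.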


Finally, each backend performs the join over Cs/B
source and Ct target records. The effective computing
complexity of the join is

O( min( n*r, B*m*r ) ), or

O( min( Cs/B, Ct ) ),

and 2*(max( n, B*m )) accesses to the secondary storage
are required. This is the access complexity of

O( max( Cs/(B*r), Ct/r ) ).


Each backend now has a portion of the result
records. Using the same notations as in the previous
chapter, there are q*(min( Cs/B, Ct )) result records at
each backend. The communication complexity of transmitting
the result records from each backend to the controller is
O( q*min( Cs/B, Ct ) ).
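
The whole alternative can be simulated on small in-memory lists to confirm that the union of the partial joins equals the full join. The following Python sketch makes several assumptions: records are (key, payload) tuples, each inner list stands for the portion held by one backend, and the names merge_join and backends_share_join are hypothetical.

```python
import heapq


def merge_join(src_sorted, tgt_sorted):
    """Match two key-sorted lists of (key, payload) records."""
    out = []
    j = 0
    for s in src_sorted:
        # skip target records whose key is smaller than the source key
        while j < len(tgt_sorted) and tgt_sorted[j][0] < s[0]:
            j += 1
        # emit every target record with an equal key; j stays at the
        # start of the run so later source records with this key reuse it
        k = j
        while k < len(tgt_sorted) and tgt_sorted[k][0] == s[0]:
            out.append((s, tgt_sorted[k]))
            k += 1
    return out


def backends_share_join(source_parts, target_parts):
    """Each backend sorts locally, receives all target runs, merges, joins."""
    sorted_src = [sorted(p) for p in source_parts]
    sorted_tgt = [sorted(p) for p in target_parts]
    results = []                      # what the controller collects
    for src in sorted_src:            # one iteration per backend
        all_targets = list(heapq.merge(*sorted_tgt))
        results.extend(merge_join(src, all_targets))
    return results
```

The controller in this sketch only concatenates the partial results, which reflects the small amount of controller work in this alternative.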


2. The Controller Performs the Join

Here, each backend performs the internal sort phase
and the partial merge phase of its portion of the source
and the target records, and then transmits these records to
the controller. The controller first merges the source and
target records separately, and then performs the join on the
source and the target records.


The effective computing complexity to sort n source
file blocks and m target file blocks at the backend is

O( n*r*log r + m*r*log r ), or

O( ((Cs+Ct)/B)*log r ).

2*(n+m) accesses to the secondary storage are required. So,
the effective access complexity is

O( (Cs+Ct)/(B*r) ).


Assume that a 2-way merge is implemented to
complete the sort of n source and m target file blocks. Then
the computing complexity of the merge is

O( n*r*log n + m*r*log m ), or

O( (Cs/B)*log(Cs/(B*r)) + (Ct/B)*log(Ct/(B*r)) ).

2*(n*log n + m*log m) accesses to the secondary storage are
required. This is the access complexity of

O( n*log n + m*log m ), or

O( (Cs/(B*r))*log(Cs/(B*r)) + (Ct/(B*r))*log(Ct/(B*r)) ).


Next, the sorted records are transmitted to the
controller. So, the communication complexity is

O( B*(n+m) ), or

O( (Cs+Ct)/r ).

The records are first stored at the controller before the
join starts. This is the access complexity of

O( (Cs+Ct)/r ).


The controller now contains B*n blocks of source
file and B*m blocks of target file. That is, B runs of
source file and B runs of target file with the length of n
and m, respectively. The computing complexity of merging
source and target records separately is

O( B*n*r*log B + B*m*r*log B ), or

O( (Cs+Ct)*log B ).

2*B*(n+m)*log B accesses to the secondary storage are
required. This is an access complexity of the merge at the
controller which is O( ((Cs+Ct)/r)*log B ).

Finally, the controller performs the join on sorted
source and target files. The computing complexity for the
join is O( min( B*n*r, B*m*r ) ), or O( min( Cs, Ct ) ),
and 2*(max( B*n, B*m )) accesses to the secondary storage
are required. This is an access complexity of the join at
the controller which is O( max( Cs/r, Ct/r ) ).
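
The work that this alternative places on the controller can be tabulated from the formulas above. The following Python sketch is illustrative only; it rounds log B up to a whole number of merge passes, which is an assumption, and controller_join_costs is a hypothetical name.

```python
import math


def controller_join_costs(B: int, n: int, m: int, r: int) -> dict:
    """Controller-side cost terms when the controller performs the join."""
    Cs, Ct = B * n * r, B * m * r          # total source and target records
    log_b = math.ceil(math.log2(B))        # 2-way merge passes over B runs
    return {
        "store_accesses": B * (n + m),                  # (Cs+Ct)/r
        "merge_comparisons": (Cs + Ct) * log_b,         # (Cs+Ct)*log B
        "merge_accesses": 2 * B * (n + m) * log_b,
        "join_comparisons": min(Cs, Ct),
        "join_accesses": 2 * max(B * n, B * m),
    }
```

None of these terms is divided by B, so adding backends does not reduce the load on the controller in this alternative.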


B. COMPARISONS BETWEEN THE TWO ALTERNATIVES

Table 4 illustrates the time complexities for both
alternatives, using a sort-match algorithm. Again, the
computing complexity, the access complexity, and the
communication complexity are given separately. The
computing complexity includes the sum of the computing
complexities of the internal sort phase, the merge phase,
and the join.

The access complexity includes the sum of the access
complexities of the internal sort phase, the merge phase,
and the join. Finally, the communication complexity shows
the time required to transmit the source and the target
records among the backends and between the controller and
the backends. The complexity formulas of accesses to the
secondary storage are given only for the additional accesses
necessary to complete the join. In other words, accesses to
the secondary storage to retrieve the records to perform
selection and projection before the join starts are not
included. Let us examine Table 4 row by row, comparing
the two alternatives.

The computing complexity in the backends of the
alternative A.1 is larger than the alternative A.2, since
each backend in alternative A.1 contains all the target
records. On the contrary, the alternative A.2 has a
computing complexity at the controller. Therefore,
alternative A.1 is better than alternative A.2 with respect
to meeting the design goal of minimizing controller
function.

The alternative A.1 requires more accesses to the
secondary storage for the backends than the alternative A.2.
However, again, the alternative A.2 requires more accesses
to the secondary storage at the controller. Therefore,
alternative A.1 is better than alternative A.2 due to
meeting the design goal of minimizing controller function.

Despite the situation that the alternative A.2 has lower
communication overhead, this may be negligible when balanced
against I/O requirements at the controller. Therefore, we
will recommend the alternative A.1, i.e., the backends
perform the join, for implementation of the join using the
sort-match algorithm in MDBS. This alternative best meets
the design goals of minimizing controller function and
sharing the work equally at the backends.

C. COMPARISONS BETWEEN THE STRAIGHTFORWARD AND THE
SORT-MATCH JOIN ALGORITHMS

Table 5 depicts the time complexities for the best
alternative using the straightforward join algorithm and the
best alternative using the sort-match join algorithm. Let us
now compare the two alternatives. Let us assume that the
number of source records is equal to the number of target
records, i.e., Cs = Ct. Let the block size, r, be equal to
64. We will compare the access complexities and the
computing complexities of the two alternatives with a
selected number of records involved, Cs and Ct, the result
proportionality, q, and varying the number of backends, B.
Figure 8.1 shows the access complexities for fixed values
of Cs=Ct and q, and number of backends, B, from 2 to 15. The
increasing number of backends has little effect on access
complexity when a sort-match algorithm is used. However,
when the straightforward algorithm is used, the access
complexity decreases sharply as the number of backends
increases. Note that for a large number of backends, B>16,
the reduction becomes negligible. The access complexity
required for the sort-match algorithm is always less than
that required for the straightforward algorithm, and is
substantially less for a smaller number of backends.

Figure 8.2 shows the computing complexity for both
algorithms with fixed values of Cs=Ct and q. In this case
both alternatives have decreasing computing complexity.
Again, the computing complexity for the sort-match algorithm
is less than that required for the straightforward join
algorithm, and substantially less for a small number of
backends. When the numbers of source and target records
increase, the difference between the two algorithms also
increases.
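
The trend in the access complexities can be reproduced numerically by adding up the access terms derived for alternative B.1 of Chapter VII and alternative A.1 of this chapter. The following Python sketch is an illustration under stated assumptions: the record count C = 2^16 is an arbitrary choice and is not the value plotted in the figures, logarithms are taken to base 2 without rounding, and all function names are hypothetical.

```python
import math


def straightforward_access(B, C, r):
    """Access terms of alternative B.1 with Cs = Ct = C."""
    n = m = (C / r) / B                  # blocks per backend
    store_targets = B * m                # store the broadcast target blocks
    join = 2 * B * m * n                 # accesses during the join
    return store_targets + join


def sort_match_access(B, C, r):
    """Access terms of alternative A.1 with Cs = Ct = C."""
    n = m = (C / r) / B
    internal_sort = 2 * (n + m)
    local_merge = 2 * (n * math.log2(n) + m * math.log2(m))
    store_targets = B * m
    merge_targets = 2 * B * m * math.log2(B)
    join = 2 * max(n, B * m)
    return internal_sort + local_merge + store_targets + merge_targets + join


def compare(C=2 ** 16, r=64, backends=range(2, 17)):
    """Rows of (B, straightforward accesses, sort-match accesses)."""
    return [(B, straightforward_access(B, C, r), sort_match_access(B, C, r))
            for B in backends]
```

Under these assumptions the straightforward figure falls roughly in proportion to 1/B, while the sort-match figure is smaller throughout and changes far less in absolute terms.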


Figure 8.3 shows the communication complexity of the
straightforward join algorithm with three values of Cs=Ct,
each a power of 2. The proportion, q, ranges from 0.1 to
0.5. Figure 8.4 depicts the same complexity for the
sort-match join algorithm with Cs=Ct over a range of powers
of 2 and q=0.1-0.5. These two figures illustrate that
increasing Cs, Ct, and q affects the communication
complexity of the straightforward join algorithm more than
that of the sort-match join algorithm.


D. RECOMMENDED PROPOSAL FOR THE DISTRIBUTION OF THE JOIN
OPERATION

In the previous sections, we have analyzed the
alternatives of the distribution of the functionality and
shown the tradeoffs and the advantages of each one by using
two different join algorithms, namely the straightforward
join algorithm and the sort-match join algorithm.

Briefly, alternative B.1 in Chapter VII using a
straightforward join algorithm and the alternative A.1 in
Chapter VIII using a sort-match join algorithm are the best
alternatives for distribution of the functionality. In both
alternatives, the functional unit performing the join in
MDBS is the backends. Finally, comparisons between these two
alternatives have shown that the alternative A.1, join at
the backends using a sort-match join algorithm, is better
than the alternative B.1, join at the backends using a
straightforward join algorithm, on account of meeting the
design goal of minimizing the communication overhead between
the controller and the backends.

Having analyzed all the alternatives, the most
appropriate choice for implementing the join in MDBS is that
each backend performs a partial join with its portion of
source records and all target records. Then, the results are
sent to the controller. The controller will then forward
the final results to the host computer.

IX. CONCLUSION

In this thesis, we introduce the sort and join
operations into the Multi-Backend Database System (MDBS).
Adding sort and join capabilities will increase the
effectiveness of the system in supporting a relational
database and relational language interfaces. The key issue
for alternatives is the way in which the functionality of
the operation is distributed among the controller and the
backends. We have observed that, in each case, assigning
most of the work to the backends is always the better
approach. Since the work is shared equally by the
backends, increasing the number of backends in the system
reduces the response time and increases the throughput, thus
meeting the design goals. The selected solutions may also be
implemented with less impact on the existing software.

Our proposal for the sort function is that the backends
perform the sorting and partial merge, and the controller
performs the final merge. Our proposal for the join
function using the sort-match join algorithm is that each
backend performs a partial join with its portion of source
records and all target records. Then, the results are sent
to the controller. The controller will then forward the
final results to the host computer.


One area for further refinement concerns the designation
of source and target relations for the join function. For
our analysis, we assume that the number of the source and
target records are equal. If this assumption changes, then
the communication complexity and the access complexity of
our proposals will be affected. Clearly, transmitting the
smaller number of records decreases the communication
complexity. The effect on access complexity is less clear.
The access complexity for the straightforward join algorithm
is sensitive to the size of the file resident in main memory.
Therefore, it may be desirable to select the larger of the
two files as the file to be transmitted.


This thesis provides the groundwork for further
analysis. We have presented computing, access, and
communication complexities separately. If some relative
weights can be assigned to these complexities, further
analyses to evaluate the tradeoffs may lead to providing a
choice among several alternatives, depending on the
distribution of the relevant records among the backends, the
communication cost and the access complexity.